Monte Carlo Analysis

 

Monte Carlo simulation is a statistical technique for stochastic model calculation and for analysing how errors propagate through a calculation. Its purpose is to trace out the structure of the distributions of model output. In its simplest form this distribution is mapped by calculating the deterministic results (realizations) for a large number of random draws from the individual distribution functions of the input data and parameters of the model. To reduce the number of model runs needed to obtain sufficient information about the output distribution (mainly to save computation time), more efficient sampling methods have been designed, such as Latin Hypercube sampling. The latter makes use of stratification in the sampling of individual parameters; as in random Monte Carlo sampling, pre-existing information about correlations between input variables can be incorporated. Monte Carlo analysis requires the analyst to specify probability distributions for all inputs and parameters, and the correlations between them. Both the probability distributions and the correlations are usually poorly known.
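The basic procedure can be sketched in a few lines of Python. The model and its input distributions below are purely hypothetical illustrations; the Latin Hypercube sampler stratifies the unit interval and maps the strata through each input's inverse CDF, as described above.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(42)
N = 10_000

# Hypothetical model for illustration: y = a * x^2, with uncertain inputs a and x.
def model(a, x):
    return a * x**2

# Plain Monte Carlo: independent random draws from each input distribution.
a_mc = rng.normal(2.0, 0.3, size=N)       # a ~ Normal(2.0, 0.3)
x_mc = rng.uniform(0.5, 1.5, size=N)      # x ~ Uniform(0.5, 1.5)
y_mc = model(a_mc, x_mc)

# Latin Hypercube sampling: split [0, 1) into N equal strata per input, draw
# one point per stratum, shuffle, then map through the input's inverse CDF.
def lhs_uniform(n, rng):
    strata = (np.arange(n) + rng.random(n)) / n
    return rng.permutation(np.clip(strata, 1e-12, 1 - 1e-12))

u_a = lhs_uniform(N, rng)
u_x = lhs_uniform(N, rng)
a_lhs = np.array([NormalDist(2.0, 0.3).inv_cdf(float(u)) for u in u_a])
x_lhs = 0.5 + u_x                          # inverse CDF of Uniform(0.5, 1.5)
y_lhs = model(a_lhs, x_lhs)

print(f"plain MC: mean={y_mc.mean():.3f}, sd={y_mc.std():.3f}")
print(f"LHS     : mean={y_lhs.mean():.3f}, sd={y_lhs.std():.3f}")
```

Both samplers map the same output distribution; the stratified sample typically reproduces its summary statistics with fewer model runs.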
 
A number of software packages are available for Monte Carlo analysis. Widely used are the commercial packages @Risk (http://www.palisade.com) and Crystal Ball (http://www.oracle.com/crystalball), both designed as fully integrated MS-Excel add-in programs with their own toolbars and menus. These packages can be used with minimal knowledge of the underlying sampling and calculation techniques, which makes Monte Carlo assessment easy (but also tricky, because it allows incompetent use). Another commercial package is Analytica (http://www.lumina.com), a quantitative modelling environment with built-in Monte Carlo algorithms.
 
If your model is not built in Excel you can use the SimLab package, which is freely available from the JRC (http://simlab.jrc.ec.europa.eu/). SimLab can also be interfaced with Excel, but this requires some programming skills. For the UNIX and MS-DOS environments you can use the UNCSAM software tool (Janssen et al., 1994). RIVM is presently developing a new tool for Monte Carlo analysis, USATOOL, which will run under Windows.
 
Additionally, most Monte Carlo analysis software offers the possibility to determine the relative contribution of the uncertainty in each parameter to the uncertainty in a model output, e.g. by means of sensitivity charts, and can be used for a sophisticated analysis of trends in the presence of uncertainty.
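One simple way to apportion output uncertainty over the inputs, used in such sensitivity charts, is the rank correlation of each sampled input with the output. The two-input model below is a hypothetical illustration in which input a dominates by construction.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5_000

# Hypothetical model with one dominant input: y = 10*a + b.
a = rng.normal(0.0, 1.0, N)
b = rng.normal(0.0, 1.0, N)
y = 10.0 * a + b

def rank(v):
    # Rank-transform a sample (ties are negligible for continuous draws).
    r = np.empty(len(v))
    r[np.argsort(v)] = np.arange(len(v))
    return r

def spearman(u, v):
    # Spearman rank correlation = Pearson correlation of the ranks.
    return np.corrcoef(rank(u), rank(v))[0, 1]

# Rank correlation of each input with the output indicates its relative
# contribution to the output uncertainty.
for name, inp in [("a", a), ("b", b)]:
    print(f"{name}: rank correlation with y = {spearman(inp, y):+.2f}")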
 
Sorts and locations of uncertainty addressed
Monte Carlo analysis typically addresses statistical uncertainty (stochastic inexactness) in inputs and parameters. Although it is rarely used this way, Monte Carlo analysis can also be used for assessing model structure uncertainty, by introducing one or more “switch parameters” that switch between different model structures, with probabilities attached to each position of the switch.
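Such a switch parameter can be sketched as follows. The two candidate model structures and the subjective probabilities (0.7 / 0.3) are hypothetical; the switch is simply sampled alongside the other inputs.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 10_000

# Two hypothetical candidate model structures for the same quantity.
def model_linear(x):
    return 2.0 * x

def model_quadratic(x):
    return 1.5 * x**2

x = rng.uniform(0.8, 1.2, N)

# "Switch parameter": per realization, pick a structure with subjective
# probabilities 0.7 / 0.3 attached to the two positions of the switch.
switch = rng.random(N) < 0.7
y = np.where(switch, model_linear(x), model_quadratic(x))

print(f"mean={y.mean():.3f}, 5th-95th percentile: "
      f"[{np.percentile(y, 5):.3f}, {np.percentile(y, 95):.3f}]")
```

The resulting output distribution is a probability-weighted mixture over the candidate structures, so structural uncertainty shows up in the same output statistics as parameter uncertainty.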
Two-dimensional Monte Carlo analysis allows for a separate treatment of knowledge-related uncertainty and variability-related uncertainty (see below under guidance on application). In this two-dimensional mode, Monte Carlo analysis provides some insight into the quality of the knowledge base. It does not address issues of value loading.
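The two-dimensional mode is typically implemented as a nested loop: an outer loop over draws representing knowledge limitations, and an inner loop over draws representing variability. The exposure setting and all numbers below are hypothetical illustrations.

```python
import numpy as np

rng = np.random.default_rng(2)
N_outer, N_inner = 200, 1000   # epistemic (outer) x variability (inner)

# Hypothetical setting: a quantity varies lognormally across a population
# (variability), but its median is itself imprecisely known (knowledge
# limitation, here expressed as a normal distribution on the log-median).
p95 = np.empty(N_outer)
for i in range(N_outer):
    log_median = rng.normal(1.0, 0.2)                               # outer draw
    pop = rng.lognormal(mean=log_median, sigma=0.5, size=N_inner)   # inner draws
    p95[i] = np.percentile(pop, 95)   # population statistic per outer draw

# The spread of the statistic across outer draws reflects knowledge limitations.
lo, hi = np.percentile(p95, [5, 95])
print(f"95th-percentile value: best estimate {np.median(p95):.2f}, "
      f"90% interval [{lo:.2f}, {hi:.2f}] due to knowledge limitations")
```

Each inner loop yields one variability distribution; the outer loop then shows how imprecise knowledge translates into uncertainty about any statistic of that distribution.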
 
Required resources
Computer
Monte Carlo software packages such as Crystal Ball and @Risk run on standard PCs; the officially recommended minimum configuration is a Pentium II, 200 MHz or faster, with 32 MB RAM. On the basis of our experience we recommend a 500 MHz processor or equivalent with 256 MB RAM as minimum configuration.
 
Training
Packages such as Crystal Ball are very easy to learn. If you are familiar with Excel, it takes less than one hour to become proficient with Crystal Ball.
 
SimLab takes more time to become proficient with and requires more skills, because one has to interface SimLab with one's own model. The forum on the SimLab website, however, has a lot of useful tips that make the task easier. We recommend the book “Sensitivity Analysis in Practice: A Guide to Assessing Scientific Models” (Saltelli et al., 2004), which has many practical examples of the use of SimLab.
 
Strengths and limitations
Typical strengths of Monte Carlo simulation
  • Provides comprehensive insight into how specified uncertainty in inputs propagates through a model.
  • Forces analysts to consider uncertainty and interdependencies among different inputs explicitly.
  • Can cope with any conceivable shape of PDF and can account for correlations.
  • Can be used in two-dimensional mode to assess variability and epistemological uncertainty separately. 
Typical weaknesses of Monte Carlo simulation
  • Monte Carlo assessment is limited to those uncertainties that can be quantified and expressed as probabilities.
  • One may not have any reasonable basis on which to ascribe a parameterised probability distribution to parameters.
  • May require long run times for computationally intensive models. This can partly be remedied by using more efficient sampling techniques (e.g. Latin Hypercube sampling).
  • The interpretation of a probability distribution of the model output by decision makers is not always straightforward; there is no single rule arising out of such a distribution that can guide decision-makers concerning the acceptable balance between for instance expected return and the variance of that return. 
Guidance on application
In their report "Guiding Principles for Monte Carlo Analysis" (EPA, 1997), the EPA presents 16 good-practice guidelines for Monte Carlo assessment. These guidelines are (we have modified the phrasing slightly to keep terminology consistent within the guidance documents):
 
Selecting Input Data and Distributions for Use in Monte Carlo Analysis
 
1. Conduct preliminary sensitivity analyses or numerical experiments to identify model structures, model input assumptions and parameters that make important contributions to the assessment and its overall uncertainty.
 
2. Restrict the use of probabilistic assessment to significant parameters.
 
3. Use data to inform the choice of input distributions for model parameters.
  • Is there any mechanistic basis for choosing a distributional family?
  • Is the shape of the distribution likely to be dictated by physical or biological properties or other mechanisms?
  • Is the variable discrete or continuous?
  • What are the bounds of the variable?
  • Is the distribution skewed or symmetric?
  • If the distribution is thought to be skewed, in which direction?
  • What other aspects of the shape of the distribution are known?
4. Proxy data can be used to develop distributions when they can be appropriately justified.
 
5. When obtaining empirical data to develop input distributions for model parameters, the basic tenets of environmental sampling should be followed. Further, particular attention should be given to the quality of information at the tails of the distributions.
 
6. Depending on the objectives of the assessment and the availability of empirical data to estimate PDFs, expert elicitation can be applied to draft probability density functions. When expert judgment is employed, the analyst should be very explicit about its use.
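Guidelines 3 to 6 can be illustrated with a small distribution-fitting sketch. The data below are synthetic stand-ins for positive, right-skewed measurements; following guideline 3, the physical constraints (positive, skewed) suggest a lognormal family, whose parameters are then fitted by matching moments on the log scale.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(3)

# Synthetic positive-valued, right-skewed "measurements" (hypothetical data).
data = rng.lognormal(mean=0.5, sigma=0.4, size=500)

# A positive, skewed variable suggests a lognormal family; fit its
# parameters by matching mean and standard deviation on the log scale.
log_data = np.log(data)
mu_hat, sigma_hat = log_data.mean(), log_data.std(ddof=1)
print(f"fitted lognormal: mu={mu_hat:.3f}, sigma={sigma_hat:.3f}")

# Sanity check: compare fitted quantiles against empirical quantiles,
# paying particular attention to the upper tail (guideline 5).
for q in (0.5, 0.95):
    fit_q = np.exp(mu_hat + sigma_hat * NormalDist().inv_cdf(q))
    emp_q = np.quantile(data, q)
    print(f"q={q}: fitted {fit_q:.2f} vs empirical {emp_q:.2f}")
```

Comparing fitted and empirical quantiles, especially in the tails, is a cheap first check on whether the chosen family is defensible before any formal goodness-of-fit test.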
 
Evaluating variability and knowledge limitations
 
7. It is useful to distinguish between uncertainty stemming from intrinsic variability and heterogeneity of the parameters on the one hand and uncertainty stemming from knowledge limitations on the other hand. Try to separate them in the analysis where possible to provide greater accountability and transparency. The decision about how to track them separately can only be made on a case-by-case basis for each variable.
 
8. Two-dimensional Monte Carlo techniques allow for the separate treatment of variability and epistemological uncertainty. There are methodological differences regarding how uncertainty stemming from variability and uncertainty stemming from knowledge limitations are addressed in a Monte Carlo analysis.
  • Variability depends on the averaging time, averaging space, or other dimensions in which the data are aggregated.
  • Standard data analysis tends to understate uncertainty from knowledge limitations by focusing solely on random error within a data set. Conversely, standard data analysis tends to overstate variability by implicitly including measurement errors.
  • Various types of model errors can represent important sources of uncertainty. Alternative conceptual or mathematical models are a potentially important source of uncertainty. A major threat to the accuracy of a variability analysis is a lack of representativeness of the data. 
9. Methods should investigate the numerical stability of the moments and the tails of the distributions.
  • Data gathering efforts should be structured to provide adequate coverage at the tails of the input distributions.
  • The assessment should include a narrative and qualitative discussion of the quality of information at the tails of the input distributions. 
10. There are limits to the assessor's ability to account for and characterize all sources of uncertainty. The analyst should identify areas of uncertainty and include them in the analysis, either quantitatively or qualitatively.
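The numerical-stability check in guideline 9 can be done by recomputing moments and tail quantiles on growing subsamples; tail estimates converge far more slowly than means. The distribution below is a hypothetical stand-in for a skewed model output.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical skewed output distribution: lognormal(0, 1).
full = rng.lognormal(mean=0.0, sigma=1.0, size=200_000)

# Recompute the mean and the 99th percentile on growing subsamples to see
# whether the tail estimate has stabilised (guideline 9).
for n in (1_000, 10_000, 100_000, 200_000):
    sample = full[:n]
    print(f"n={n:>7}: mean={sample.mean():.3f}  "
          f"p99={np.percentile(sample, 99):.3f}")
```

If the tail quantile still drifts noticeably between the largest sample sizes, more realizations (or a more efficient sampling scheme) are needed before reporting tail-based results.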
 
Presenting the Results of a Monte Carlo Analysis
 
11. Provide a complete and thorough description of the model or calculation scheme and its equations, including a discussion of the limitations of the methods and the results.
 
12. Provide detailed information on the input distributions selected. This information should identify whether the input represents largely variability, largely uncertainty, or some combination of both. Further, information on goodness-of-fit statistics should be discussed.
 
A PDF plot is useful for displaying:
  • The relative probability of values;
  • The most likely values (e. g., modes);
  • The shape of the distribution (e. g., skewness, kurtosis); and
  • Small changes in probability density. 
A CDF plot is good for displaying:
  • Fractiles, including the median;
  • Probability intervals, including confidence intervals;
  • Stochastic dominance; and
  • Mixed, continuous, and discrete distributions. 
13. Provide detailed information and graphs for each output distribution.
 
14. Discuss the presence or absence of dependencies and correlations.
 
15. Calculate and present point estimates.
 
16. A progressive disclosure of information style in presentation, in which briefing materials are assembled at various levels of detail, may be helpful. Presentations should be tailored to address the questions and information needs of the audience.
 
  • Avoid excessively complicated graphs. Keep graphs intended for a glance (e. g., overhead or slide presentations) relatively simple and uncluttered. Graphs intended for publication can include more complexity.
  • Avoid perspective charts (3-dimensional bar and pie charts, ribbon charts) and pseudo-perspective charts (2-dimensional bar or line charts).
  • Color and shading can create visual biases and are very difficult to use effectively. Use color or shading only when necessary and then, only very carefully. Consult references on the use of color and shading in graphics.
  • When possible in publications and reports, graphs should be accompanied by a table of the relevant data.
  • If probability density or cumulative probability plots are presented, present both, with one above the other on the same page, with identical horizontal scales and with the location of the mean clearly indicated on both curves with a solid point.
  • Do not depend on the audience to correctly interpret any visual display of data. Always provide a narrative in the report interpreting the important aspects of the graph.
  • Descriptive statistics and box plots generally serve the less technically oriented audience well. Probability density and cumulative probability plots are generally more meaningful to risk assessors and uncertainty analysts. 
For a full discussion of these 16 guidelines we refer to the EPA report (EPA, 1997).
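As a minimal sketch of guidelines 12 to 15, the point estimates and fractiles that should accompany PDF and CDF plots can be tabulated directly from the output sample. The output distribution below is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical model output sample.
y = rng.lognormal(mean=1.0, sigma=0.6, size=50_000)

# Point estimates and fractiles to report alongside the PDF/CDF plots.
stats = {
    "mean":   y.mean(),
    "median": np.median(y),
    "p05":    np.percentile(y, 5),
    "p95":    np.percentile(y, 95),
}
for name, value in stats.items():
    # Report a modest number of digits; extra digits suggest spurious
    # precision (see the pitfalls section below).
    print(f"{name:>6}: {value:.2f}")
```

For a skewed output such as this one, mean and median differ noticeably, which is exactly why both point estimates and fractiles should be presented.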
 
The EPA report also gives some guidance on the issue of constructing adequate probability density functions using proxy data, fitting distributions, using default distributions and using subjective distributions. Important questions in this process are:
 
  • Is there Prior Knowledge about Mechanisms?
  • Are the proxy data of acceptable quality and representativeness to support reliable estimates?
  • What uncertainties and biases are likely to be introduced by using proxy data?
  • How are the biases likely to affect the analysis and can the biases be corrected? 
In identifying plausible distributions to represent variability, the following characteristics of the variable should be taken into account:
  • Nature of the variable (discrete or continuous)
  • Physical or plausible range of the variable (e. g., takes on only positive values)
  • Symmetry of the Distribution. (E.g. is the shape of the distribution likely to be dictated by physical/ biological properties such as logistic growth rates)
  • Summary Statistics (Frequently, knowledge on ranges can be used to eliminate inappropriate distributions; If the coefficient of variation is near 1.0, then an exponential distribution might be appropriate etc.) 
Pitfalls
Typical pitfalls of Monte Carlo Analysis are:
  • Forgetting that Monte Carlo analysis takes the model structure and boundaries for granted
  • Ignoring correlations
  • Hyper precision: Often the PDFs used for the inputs have the status of educated guesses. The output produced by the software packages usually comes out of the computer with a large number of digits, most of which are certainly not significant. The shapes of the input distributions are also usually not well known; one should therefore not attribute too much meaning to the precise shape of the distribution that comes out of the calculation.
  • Glossy reports: Present-day software packages for Monte Carlo analysis can easily be used without prior knowledge of Monte Carlo analysis or of probability distribution theory. The somewhat glossy results produced by the computer look very professional even if the experiment was poorly designed. We therefore recommend not using these packages without understanding the basics of probability distributions, correlations and Monte Carlo analysis. The handbooks that go with the software provide good primers on these issues; we particularly recommend the Crystal Ball handbook in this respect.
  • Note that several software packages for Monte Carlo Analysis (inter alia certain versions of SimLab, and Crystal Ball) give false results if Windows is configured to use a comma as decimal separator rather than a dot.
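The "ignoring correlations" pitfall can be avoided by sampling correlated inputs explicitly. One common approach, sketched below with hypothetical inputs and a hypothetical target correlation of 0.8, is a Gaussian copula: draw correlated standard normals and transform each to its target marginal.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(6)
N = 20_000

# Draw correlated standard normals with the specified correlation.
rho = 0.8
z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=N)

def std_normal_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

# Transform each correlated normal to its target marginal distribution.
a = 2.0 + 0.3 * z[:, 0]                            # a ~ Normal(2.0, 0.3)
u = np.array([std_normal_cdf(v) for v in z[:, 1]])
x = 0.5 + u                                        # x ~ Uniform(0.5, 1.5)

print(f"achieved correlation(a, x) = {np.corrcoef(a, x)[0, 1]:.2f}")
```

The achieved product-moment correlation is slightly below the copula parameter because the marginal transformations preserve rank correlation rather than Pearson correlation; packages such as @Risk and SimLab handle this rank-based induction internally.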

References
Handbooks:
EPA, Risk Assessment Forum, Guiding Principles for Monte Carlo Analysis, EPA/630/R-97/001, 1997.

Andrea Saltelli, Marco Ratto, Terry Andres, Francesca Campolongo, Jessica Cariboni, Debora Gatelli, Michaela Saisana, Stefano Tarantola, Global Sensitivity Analysis: The Primer, John Wiley & Sons, Chichester, 2008.
 
M.G. Morgan and M. Henrion, Uncertainty, A Guide to Dealing with Uncertainty in Quantitative Risk and Policy Analysis, Cambridge University Press, 1990.
 
Crystal Ball 2000, User Manual Decision Engineering Inc., Denver, 2000.
 
Palisade Corporation (2000): Guide to Using @RISK - Risk Analysis and Simulation Add-in for Microsoft Excel, Version 4, March 2000.
 
Andrea Saltelli, Karen Chan, Marian Scott, Sensitivity Analysis John Wiley & Sons publishers, Probability and Statistics series, 2000.
 
Andrea Saltelli, Stefano Tarantola, Francesca Campolongo, Marco Ratto, Sensitivity Analysis in Practice: A Guide to Assessing Scientific Models, John Wiley & Sons publishers, 2004.
 
Vose D. (2000): Risk Analysis – A quantitative guide, 2nd edition. John Wiley & Sons, Ltd. Chichester.
 
 
Papers and reports
Burmaster, D.E., and Anderson, P.D.: Principles of Good Practice for the Use of Monte Carlo Techniques in Human Health and Ecological Risk Assessments, Risk Analysis, Vol. 14, No. 4, 1994.
 
IPCC, Good Practice Guidance and Uncertainty Management in National Greenhouse Gas Inventories, IPCC, 2000.
 
P.H.M. Janssen, P.S.C. Heuberger, & R.A. Sanders. 1994. UNCSAM: a tool for automating sensitivity and uncertainty analysis. Environmental Software 9:1-11.